Multi-scale and Multi-model Integration for Improved Performance in Chinese Spoken Document Retrieval

Authors

  • Wai-Kit LO
  • Helen MENG
Abstract

This paper describes our attempt to combine the relative merits of different indexing units (scales) and different retrieval models to improve performance in Chinese spoken document retrieval. Our study includes indexing units at three scales: words, character bigrams and syllable bigrams. We also include two retrieval models: the HMM-based model and the vector space model (VSM). Our retrieval task is based on the TDT-2 Mandarin collection, in which news text is used to retrieve relevant Mandarin audio. We experimented with different scales and retrieval models. The HMM-based model retrieves better at the word scale (mAP=0.566), while the VSM performs better at the character bigram scale (mAP=0.562). We then conducted a series of integration experiments in which the ranked retrieval lists from different runs are combined by rank-based re-scoring. The best retrieval performance (mAP=0.591) is achieved when we integrate the HMM-word and VSM-character configurations. These results suggest that retrieval based on different scales and different models captures different kinds of knowledge, which can be integrated to improve retrieval performance.
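To make the integration step and the reported metric concrete, here is a minimal sketch of fusing two ranked lists by rank-based re-scoring and scoring the results with (mean) average precision. The abstract does not give the paper's exact re-scoring function, so this sketch assumes a reciprocal-rank scheme in which each document earns weight / rank from every list it appears in. The function names (`rank_fusion`, `average_precision`, `mean_average_precision`) and the toy document IDs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def rank_fusion(ranked_lists, weights=None):
    """Merge several ranked lists of document IDs into one ranking.

    Assumption (not from the paper): each document receives weight / rank
    from every list it appears in, i.e. a reciprocal-rank re-scoring.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / rank
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranking, relevant):
    """Non-interpolated average precision of one ranked list."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """mAP: runs maps query ID -> ranking, qrels maps query ID -> relevant set."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)


if __name__ == "__main__":
    # Hypothetical toy runs for a single query (not data from the paper).
    hmm_word = ["d3", "d7", "d1", "d2"]
    vsm_char = ["d1", "d9", "d3", "d2"]
    relevant = {"d1", "d3"}
    fused = rank_fusion([hmm_word, vsm_char])
    for name, run in [("HMM-word", hmm_word), ("VSM-char", vsm_char), ("fused", fused)]:
        print(name, run, round(average_precision(run, relevant), 3))
```

In this toy case each single run places one relevant document at rank 3, while the fused list moves both relevant documents to the top, which mirrors (without reproducing) the paper's observation that different configurations capture complementary evidence. Per-list weights are exposed so that, for example, a stronger run could be trusted more.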


Similar resources

Performance Model for Vertical Wells with Multi-stage Horizontal Hydraulic Fractures in Water Flooded Multilayer Reservoirs

For the characteristics of horizontal fractures in shallow low-permeability oil layers after hydraulic fracturing in multilayer reservoirs, horizontal fractures are treated as equivalent to an elliptical cylinder with the reservoir thickness using the equivalent permeability model; then, based on elliptic seepage theory, the seepage field induced by a vertical well with horizontal fractures is...


Information fusion for monolingual and cross-language spoken document retrieval

of thesis entitled: Information fusion for monolingual and cross-language spoken document retrieval. Submitted by LO Wai-Kit for the degree of Doctor of Philosophy at The Chinese University of Hong Kong in October 2002. Spoken document retrieval (SDR) is an important technique that enables relevant information to be searched from spoken data archives. With the advent of Internet and multimedia te...


Multi-scale-audio indexing for translingual spoken document retrieval

MEI (Mandarin-English Information) is an English-Chinese crosslingual spoken document retrieval (CL-SDR) system developed during the Johns Hopkins University Summer Workshop 2000. We integrate speech recognition, machine translation, and information retrieval technologies to perform CL-SDR. MEI advocates a multi-scale paradigm, where both Chinese words and subwords (characters and syllables) ar...



A New Compromise Decision-making Model based on TOPSIS and VIKOR for Solving Multi-objective Large-scale Programming Problems with a Block Angular Structure under Uncertainty

This paper proposes a compromise model, based on a new method, to solve the multi-objective large-scale linear programming (MOLSLP) problems with block angular structure involving fuzzy parameters. The problem involves fuzzy parameters in the objective functions and constraints. In this compromise programming method, two concepts are considered simultaneously. First of them is that the optimal ...




Publication date: 2002